Source | # of sentences | Average logarithmic rank |
---|---|---|
Mga Wika ng Pilipinas | 20 | 4.50 |
Lungsod ng Maynila | 24 | 4.69 |
Bataan | 15 | 4.70 |
Kabite | 15 | 4.73 |
Kalibo, Aklan | 11 | 4.73 |
Filipino American | 13 | 4.74 |
Wikang Tagalog | 24 | 4.74 |
Mga lalawigan ng Pilipinas | 20 | 4.78 |
Mehiko | 14 | 4.80 |
Totoong Simbahan ni Hesus | 13 | 4.81 |
Mindanao | 30 | 4.81 |
Yuko Nakazawa | 12 | 4.84 |
Navotas | 13 | 4.86 |
Kongreso ng Pilipinas | 11 | 4.88 |
Barangay | 21 | 4.88 |
Ekonomiya ng Pilipinas | 11 | 4.89 |
Mga lungsod ng Pilipinas | 20 | 4.89 |
Kalakhang Maynila | 12 | 4.90 |
Wikang Italyano | 14 | 4.91 |
Juan Tamad | 11 | 4.92 |
Marinduque | 11 | 4.94 |
Rosario, Kabite | 14 | 4.94 |
Materya | 15 | 4.95 |
Ekwasyong kimikal | 26 | 4.96 |
Unang Republika ng Pilipinas | 15 | 4.98 |
Pormulang kimikal | 23 | 4.99 |
Antique | 21 | 5.00 |
Wika | 11 | 5.00 |
Ilang | 11 | 5.01 |
Batanes | 29 | 5.03 |

Second table (sources with the highest average logarithmic rank):

Source | # of sentences | Average logarithmic rank |
---|---|---|
Awit ng Federasyong Ruso | 15 | 7.55 |
Tala ng mga pambansang awit | 57 | 7.42 |
Ama Namin | 11 | 6.73 |
Eat Bulaga | 11 | 6.19 |
Pagsamba | 35 | 6.05 |
Uzumaki Naruto | 12 | 6.01 |
Mobile Suit Gundam SEED Destiny | 11 | 6.01 |
Parusiya | 14 | 5.97 |
Cris Villanueva | 11 | 5.81 |
Juan Escandor | 15 | 5.80 |
Zoran Đinđić | 21 | 5.79 |
NHK Broadcasting Center | 16 | 5.76 |
Partido ng mga Luntian | 12 | 5.76 |
Asido | 36 | 5.74 |
Tetragrammaton | 15 | 5.74 |
Henetika | 24 | 5.72 |
Espesyal na relativity | 23 | 5.69 |
Rafael del Riego | 16 | 5.69 |
Wikang Latin | 11 | 5.69 |
Paaralang Elementarya ng Nemesio I. Yabut | 11 | 5.68 |
Pagdating ng kanluranin sa asya | 13 | 5.68 |
Dietrich Bonhoeffer | 25 | 5.68 |
Michael Charleston Chua | 115 | 5.67 |
Martin Luther | 19 | 5.67 |
Francine Prieto | 12 | 5.66 |
Pandaigdigang Araw ng Kabataan | 13 | 5.66 |
Bagong Hukbong Bayan | 16 | 5.66 |
Fra Lippo Lippi (banda) | 15 | 5.66 |
The Da Vinci Code | 11 | 5.66 |
Jesus Balmori | 13 | 5.66 |

In this subsection we replace the average word length by the average logarithmic word rank. We take the logarithm of the word rank so that words with a high rank (that is, rare words) are penalized only moderately.

Query for the first table (sources with the lowest average logarithmic rank):
select source, count(distinct i_s.s_id) as cnt_s, round(avg(log(w.w_id-100)),2) as av from sources so, inv_so i_s, inv_w i, words w where so.so_id=i_s.so_id and i_s.s_id=i.s_id and i.w_id=w.w_id and w.w_id>100 group by source having cnt_s>10 order by av LIMIT 30;
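
The aggregation performed by the query can be sketched in Python as follows. This is a minimal illustration, not the tooling actually used to produce the tables: the function name `average_log_rank`, the flat `(source, s_id, w_id)` input rows, and the constant `RESERVED_IDS` are hypothetical. The offset of 100 simply mirrors the `w.w_id-100` term and the `w.w_id>100` filter in the query; why those first 100 ids are skipped is not stated here.

```python
import math
from collections import defaultdict

# Assumption taken from the query: word ids up to 100 are ignored and the
# remaining ids are shifted so that w_id = 101 corresponds to rank 1.
RESERVED_IDS = 100


def average_log_rank(rows, min_sentences=10, limit=30):
    """Return (source, sentence count, average log rank) sorted ascending.

    rows: iterable of (source, s_id, w_id), one entry per word occurrence,
    i.e. the result of joining sources, inv_so, inv_w and words.
    """
    sentences = defaultdict(set)   # source -> distinct sentence ids
    log_sum = defaultdict(float)   # source -> sum of log(rank)
    count = defaultdict(int)       # source -> number of word occurrences

    for source, s_id, w_id in rows:
        if w_id <= RESERVED_IDS:   # same filter as "w.w_id > 100"
            continue
        sentences[source].add(s_id)
        log_sum[source] += math.log(w_id - RESERVED_IDS)  # natural log
        count[source] += 1

    result = [
        (source, len(sentences[source]), round(log_sum[source] / count[source], 2))
        for source in count
        if len(sentences[source]) > min_sentences   # "having cnt_s > 10"
    ]
    result.sort(key=lambda row: row[2])              # "order by av"
    return result[:limit]                            # "LIMIT 30"
```

Reversing the sort order (`reverse=True` in `sort`) would list the sources with the highest values instead, which is presumably how the second table was obtained.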